 traversable region


SocialNav: Training Human-Inspired Foundation Model for Socially-Aware Embodied Navigation

Chen, Ziyi, Guo, Yingnan, Chu, Zedong, Luo, Minghua, Shen, Yanfen, Sun, Mingchao, Hu, Junjun, Xie, Shichao, Yang, Kuan, Shi, Pei, Gu, Zhining, Liu, Lu, Han, Honglin, Wu, Xiaolong, Xu, Mu, Zhang, Yu

arXiv.org Artificial Intelligence

Embodied navigation that adheres to social norms remains an open research challenge. Our SocialNav is a foundational model for socially-aware navigation with a hierarchical "brain-action" architecture, capable of understanding high-level social norms and generating low-level, socially compliant trajectories. To enable such dual capabilities, we construct the SocNav Dataset, a large-scale collection of 7 million samples, comprising (1) a Cognitive Activation Dataset providing social reasoning signals such as chain-of-thought explanations and social traversability prediction, and (2) an Expert Trajectories Pyramid aggregating diverse navigation demonstrations from internet videos, simulated environments, and real-world robots. A multi-stage training pipeline is proposed to gradually inject and refine navigation intelligence: we first inject general navigation skills and social norms understanding into the model via imitation learning, and then refine such skills through a deliberately designed Socially-Aware Flow Exploration GRPO (SAFE-GRPO), the first flow-based reinforcement learning framework for embodied navigation that explicitly rewards socially compliant behaviors. SocialNav achieves +38% success rate and +46% social compliance rate compared to the state-of-the-art method, demonstrating strong gains in both navigation performance and social compliance. Our project page: https://amap-eai.github.io/SocialNav/
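The abstract does not spell out how SAFE-GRPO computes its learning signal. As a rough illustration of the group-relative idea that GRPO-style methods are generally built on, the sketch below normalizes the rewards of several sampled trajectories against their own group mean and standard deviation. The function name, the use of the population standard deviation, and the example rewards are assumptions made for illustration, not details taken from the paper.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the mean and spread of its own group.

    Hypothetical helper: `rewards` holds one scalar per trajectory sampled
    for the same navigation query. How SAFE-GRPO scores social compliance
    is not described in the abstract, so the rewards are taken as given.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; an assumption
    return [(r - mean) / (std + eps) for r in rewards]


# Four sampled trajectories: two judged compliant (1.0), two not (0.0).
advantages = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
```

Trajectories that score above the group average receive positive advantages and the rest negative ones, so no separate value network is needed to provide a baseline.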


An Onboard Framework for Staircases Modeling Based on Point Clouds

Qing, Chun, Zeng, Rongxiang, Wu, Xuan, Shi, Yongliang, Ma, Gan

arXiv.org Artificial Intelligence

The detection of traversable regions on staircases and the modeling of their physical attributes are pivotal to the mobility of legged robots. This paper presents an onboard framework tailored to the detection of traversable regions and the modeling of physical attributes of staircases from point cloud data. To mitigate the influence of illumination variations and the overfitting due to the dataset diversity, a series of data augmentations are introduced to enhance the training of the fundamental network. A curvature suppression cross-entropy (CSCE) loss is proposed to reduce the ambiguity of prediction on the boundary between traversable and non-traversable regions. Moreover, a measurement correction based on the pose estimation of stairs is introduced to calibrate the output of raw modeling that is influenced by tilted perspectives. Lastly, we collect a dataset pertaining to staircases and introduce new evaluation criteria. Through a series of rigorous experiments conducted on this dataset, we substantiate the superior accuracy and generalization capabilities of our proposed method. Codes, models, and datasets will be available at https://github.com/szturobotics/Stair-detection-and-modeling-project.


Similar but Different: A Survey of Ground Segmentation and Traversability Estimation for Terrestrial Robots

Lim, Hyungtae, Oh, Minho, Lee, Seungjae, Ahn, Seunguk, Myung, Hyun

arXiv.org Artificial Intelligence

With the increasing demand for mobile robots and autonomous vehicles, several approaches for long-term robot navigation have been proposed. Among these techniques, ground segmentation and traversability estimation play important roles in perception and path planning, respectively. Even though these two techniques appear similar, their objectives are different. Ground segmentation divides data into ground and non-ground elements; thus, it is used as a preprocessing stage to extract objects of interest by rejecting ground points. In contrast, traversability estimation identifies and comprehends areas in which robots can move safely. Nevertheless, some researchers use these terms without clear distinction, leading to misunderstanding the two concepts. Therefore, in this study, we survey related literature and clearly distinguish ground and traversable regions considering four aspects: a) maneuverability of robot platforms, b) position of a robot in the surroundings, c) subset relation of negative obstacles, and d) subset relation of deformable objects.


Contrastive Label Disambiguation for Self-Supervised Terrain Traversability Learning in Off-Road Environments

Xue, Hanzhang, Hu, Xiaochang, Xie, Rui, Fu, Hao, Xiao, Liang, Nie, Yiming, Dai, Bin

arXiv.org Artificial Intelligence

Discriminating the traversability of terrains is a crucial task for autonomous driving in off-road environments. However, it is challenging due to the diverse, ambiguous, and platform-specific nature of off-road traversability. In this paper, we propose a novel self-supervised terrain traversability learning framework, utilizing a contrastive label disambiguation mechanism. Firstly, weakly labeled training samples with pseudo labels are automatically generated by projecting actual driving experiences onto the terrain models constructed in real time. Subsequently, a prototype-based contrastive representation learning method is designed to learn distinguishable embeddings, facilitating the self-supervised updating of those pseudo labels. Through the iterative interaction between representation learning and pseudo label updating, the ambiguities in those pseudo labels are gradually eliminated, enabling the learning of platform-specific and task-specific traversability without any human-provided annotations. Experimental results on the RELLIS-3D dataset and our Gobi Desert driving dataset demonstrate the effectiveness of the proposed method.


Learning Off-Road Terrain Traversability with Self-Supervisions Only

Seo, Junwon, Sim, Sungdae, Shim, Inwook

arXiv.org Artificial Intelligence

Estimating the traversability of terrain should be reliable and accurate in diverse conditions for autonomous driving in off-road environments. However, learning-based approaches often yield unreliable results when confronted with unfamiliar contexts, and it is challenging to obtain manual annotations frequently for new circumstances. In this paper, we introduce a method for learning traversability from images that utilizes only self-supervision and no manual labels, enabling it to easily learn traversability in new circumstances. To this end, we first generate self-supervised traversability labels from past driving trajectories by labeling regions traversed by the vehicle as highly traversable. Using the self-supervised labels, we then train a neural network that identifies terrains that are safe to traverse from an image using a one-class classification algorithm. Additionally, we supplement the limitations of self-supervised labels by incorporating methods of self-supervised learning of visual representations. To conduct a comprehensive evaluation, we collect data in a variety of driving environments and perceptual conditions and show that our method produces reliable estimations in various environments. In addition, the experimental results validate that our method outperforms other self-supervised traversability estimation methods and achieves comparable performances with supervised learning methods trained on manually labeled data.
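The labeling step described above, marking regions the vehicle has already driven over as traversable, can be illustrated with a minimal sketch. It works on a bird's-eye grid rather than on images as the paper does, and the function name, grid layout, and footprint radius are all assumptions chosen for illustration. Cells that were never driven over stay unlabeled rather than negative, which is what makes a one-class formulation a natural fit.

```python
import math


def label_traversed_cells(trajectory, grid_size, cell_m, half_width_m):
    """Mark grid cells near past vehicle positions as traversable (1).

    Hypothetical helper: `trajectory` is a list of (x, y) positions in
    metres, `grid_size` is (rows, cols), `cell_m` is the cell edge length,
    and `half_width_m` approximates half the vehicle width. Cells that were
    never driven over are left as None (unknown), not marked negative.
    """
    rows, cols = grid_size
    labels = [[None] * cols for _ in range(rows)]
    reach = int(math.ceil(half_width_m / cell_m))
    for x, y in trajectory:
        ci, cj = int(y // cell_m), int(x // cell_m)
        for i in range(max(0, ci - reach), min(rows, ci + reach + 1)):
            for j in range(max(0, cj - reach), min(cols, cj + reach + 1)):
                # Compare the cell centre with the vehicle position.
                cx, cy = (j + 0.5) * cell_m, (i + 0.5) * cell_m
                if math.hypot(cx - x, cy - y) <= half_width_m:
                    labels[i][j] = 1
    return labels


# Vehicle drove through the first two cells of the bottom row.
labels = label_traversed_cells([(0.5, 0.5), (1.5, 0.5)], (3, 3), 1.0, 0.4)
```

Because only positive labels exist, a classifier trained on them has to model what traversed terrain looks like instead of separating two labeled classes.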


LiDAR Road-Atlas: An Efficient Map Representation for General 3D Urban Environment

Wu, Banghe, Xu, Chengzhong, Kong, Hui

arXiv.org Artificial Intelligence

In this work, we propose the LiDAR Road-Atlas, a compactable and efficient 3D map representation, for autonomous robot or vehicle navigation in general urban environments. The LiDAR Road-Atlas can be generated by an online mapping framework based on incrementally merging local 2D occupancy grid maps (2D-OGM). Specifically, the contributions of our LiDAR Road-Atlas representation are threefold. First, we solve the challenging problem of creating local 2D-OGM in non-structured urban scenes based on a real-time delimitation of traversable and curb regions in LiDAR point clouds. Second, we achieve accurate 3D mapping in multiple-layer urban road scenarios by a probabilistic fusion scheme. Third, we achieve very efficient 3D map representation of general environments thanks to the automatic local-OGM induced traversable-region labeling and a sparse probabilistic local point-cloud encoding. Given the LiDAR Road-Atlas, one can achieve accurate vehicle localization, path planning and some other tasks. Our map representation is insensitive to dynamic objects, which can be filtered out in the resulting map based on a probabilistic fusion. Empirically, we compare our map representation with a couple of popular map representation methods in the robotics and autonomous driving communities, and our map representation is more favorable in terms of efficiency, scalability and compactness. In addition, we also evaluate localization accuracy extensively given the created LiDAR Road-Atlas representations on several public benchmark datasets. With a 16-channel LiDAR sensor, our method achieves average global localization errors of 0.26m (translation) and 1.07 degrees (rotation) on the Apollo dataset, and 0.89m (translation) and 1.29 degrees (rotation) on the MulRan dataset, respectively, at 10Hz, which validates the promising performance of our map representation for autonomous driving.
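The abstract does not specify its probabilistic fusion scheme. A common way to fuse repeated occupancy observations of one grid cell is the standard log-odds Bayesian update, sketched below to show why such fusion tends to suppress dynamic objects. The function names and the example sensor probabilities (0.7 for a hit, 0.3 for a miss) are assumptions, and this is not claimed to be the authors' method.

```python
import math


def logit(p):
    """Convert a probability into log-odds."""
    return math.log(p / (1.0 - p))


def fuse_occupancy(observations, prior=0.5):
    """Fuse per-scan occupancy probabilities for a single grid cell.

    Standard log-odds occupancy update, used here as a stand-in for the
    unspecified fusion scheme: each observation adds its log-odds relative
    to the prior, and the sum is mapped back to a probability.
    """
    log_odds = logit(prior)
    for p in observations:
        log_odds += logit(p) - logit(prior)
    return 1.0 / (1.0 + math.exp(-log_odds))


# A static wall seen as occupied in two scans grows more certain.
static_cell = fuse_occupancy([0.7, 0.7])
# A passing car occupies the cell once, then the cell is seen free five times.
dynamic_cell = fuse_occupancy([0.7] + [0.3] * 5)
```

In this sketch the cell briefly occupied by a moving object ends up with a low occupancy probability once enough free observations accumulate, which matches the behavior the abstract attributes to probabilistic fusion.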


Monocular Camera-based Complex Obstacle Avoidance via Efficient Deep Reinforcement Learning

Ding, Jianchuan, Gao, Lingping, Liu, Wenxi, Piao, Haiyin, Pan, Jia, Du, Zhenjun, Yang, Xin, Yin, Baocai

arXiv.org Artificial Intelligence

Deep reinforcement learning has achieved great success in laser-based collision avoidance works because the laser can sense accurate depth information without too much redundant data, which can maintain the robustness of the algorithm when it is migrated from the simulation environment to the real world. However, high-cost laser devices are not only difficult to deploy on a large number of robots but also demonstrate unsatisfactory robustness towards complex obstacles, including irregular obstacles, e.g., tables, chairs, and shelves, as well as complex ground and special materials. In this paper, we propose a novel monocular camera-based complex obstacle avoidance framework. In particular, we transform the captured RGB images into pseudo-laser measurements for efficient deep reinforcement learning. Compared to the traditional laser measurement captured at a certain height, which only contains one-dimensional distance information to the neighboring obstacles, our proposed pseudo-laser measurement fuses the depth and semantic information of the captured RGB image, which makes our method effective for complex obstacles. We also design a feature extraction guidance module to weight the input pseudo-laser measurement, so that the agent pays more reasonable attention to the current state, which is conducive to improving the accuracy and efficiency of the obstacle avoidance policy. Besides, we adaptively add synthesized noise to the laser measurement during the training stage to decrease the sim-to-real gap and increase the robustness of our model in the real environment. Finally, the experimental results show that our framework achieves state-of-the-art performance in several virtual and real-world scenarios.
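One simple way to picture a pseudo-laser measurement that fuses depth and semantics is to collapse each image column to the distance of its nearest non-traversable pixel. The sketch below does exactly that. It assumes a per-pixel depth map and a per-pixel class map are already available, and the function name, class ids, and range cap are hypothetical; the paper's actual fusion may differ.

```python
def pseudo_laser(depth, semantic, traversable_ids, max_range):
    """Collapse a depth map and a semantic map into a 1D range scan.

    Hypothetical sketch: `depth` and `semantic` are equally sized 2D lists
    (rows x cols). For every image column, the scan value is the smallest
    depth among pixels whose class is not in `traversable_ids`, or
    `max_range` if the whole column is traversable.
    """
    rows, cols = len(depth), len(depth[0])
    scan = []
    for j in range(cols):
        nearest = max_range
        for i in range(rows):
            if semantic[i][j] not in traversable_ids:
                nearest = min(nearest, depth[i][j])
        scan.append(nearest)
    return scan


depth = [[5.0, 4.0, 6.0],
         [2.0, 3.0, 1.0]]
semantic = [[1, 1, 0],   # class 0 = floor (traversable), class 1 = obstacle
            [0, 1, 0]]
scan = pseudo_laser(depth, semantic, {0}, 10.0)
```

Unlike a laser mounted at a fixed height, this scan takes the whole image column into account, so a table top or shelf above the laser plane still shortens the reported range.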


OFFSEG: A Semantic Segmentation Framework For Off-Road Driving

Viswanath, Kasi, Singh, Kartikeya, Jiang, Peng, B., Sujit P., Saripalli, Srikanth

arXiv.org Artificial Intelligence

Off-road image semantic segmentation is challenging due to the presence of uneven terrains, unstructured class boundaries, irregular features and strong textures. These aspects affect the perception of the vehicle, from which the information is used for path planning. Current off-road datasets exhibit difficulties like class imbalance and understanding of varying environmental topography. To overcome these issues we propose a framework for off-road semantic segmentation called OFFSEG that involves (i) a pooled class semantic segmentation with four classes (sky, traversable region, non-traversable region and obstacle) using state-of-the-art deep learning architectures and (ii) a colour segmentation methodology to segment out specific sub-classes (grass, puddle, dirt, gravel, etc.) from the traversable region for better scene understanding. The evaluation of the framework is carried out on two off-road driving datasets, namely, RELLIS-3D and RUGD. We have also tested the proposed framework on frames from the IISERB campus. The results show that OFFSEG achieves good performance and also provides detailed information on the traversable region.